NASA related Roadmaps


C. L. Liu, “Scheduling algorithms for multiprocessors in a hard real-time environment,” JPL Space
Programs Summary 37-60, vol. II, pp. 28-31, 1969:


NASA/TM-2013-217986/REV1, Flight Avionics Hardware Roadmap, Avionics Steering
Committee, January 2014:

page  (i): 

“The  ASC’s  specific  recommendations  for  near-term  investments  are:  ...  Rad  Hard  Multicore 
Processor” 

page  34: 

“CD07:  Advanced  COTS-Based  Instrument  Processor... As  a  follow  on  to  CD3,  this  C&DH 
subsystem  will  utilize  future  generations  of  COTS  devices.” 


Steering  Committee  for  NASA  Technology  Roadmaps;  National  Research  Council  of  the  National 
Academies,  “NASA  Space  Technology  Roadmaps  and  Priorities:  Restoring  NASA's  Technological 
Edge  and  Paving  the  Way  for  a  New  Era  in  Space”: 

page  S-7  and  S-8  in  section  “TOP  TECHNICAL  CHALLENGES”: 

“C9)  Improved  Flight  Computers:  Develop  advanced  flight-capable  devices  and  system 
software  for  real-time  flight  computing  with  low  power,  radiation-hard  and  fault-tolerant 
hardware  that  can  be  applied  to  autonomous  landing,  rendezvous  and  surface  hazard 
avoidance.”


Software Engineering Institute


Carnegie  Mellon  University 


Multicore  Real-Time  Scheduling 
May  18,  2015 

©2015  Carnegie  Mellon  University 


Flight control

Feedback controller

1. Sleep until the right time

2. Read sensor

3. Compute actuation command

4. Actuate command

5. Go to 1.

The delay must be at most x milliseconds.
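The five-step loop above can be sketched in Python as follows. This is a minimal, non-real-time illustration under stated assumptions: the names `run_control_loop`, `period_s` and `max_delay_s` are hypothetical, and "the delay" is assumed to be measured from the sensor read to the end of actuation. The clock and sleep functions are injectable so that the loop can be checked against a simulated clock.

```python
import time
from typing import Callable, List


def run_control_loop(
    iterations: int,
    period_s: float,
    max_delay_s: float,
    read_sensor: Callable[[], float],
    compute: Callable[[float], float],
    actuate: Callable[[float], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Run the periodic feedback loop and report, per iteration, whether the
    (assumed) sensor-to-actuation delay stayed within max_delay_s."""
    deadline_met: List[bool] = []
    next_release = clock()
    for _ in range(iterations):
        # Step 1: sleep until the next release time. Using an absolute
        # release time keeps the period from drifting.
        remaining = next_release - clock()
        if remaining > 0:
            sleep(remaining)
        start = clock()
        # Steps 2-4: read sensor, compute actuation command, actuate.
        command = compute(read_sensor())
        actuate(command)
        # Assumed meaning of "the delay": from sensor read to actuation done.
        deadline_met.append(clock() - start <= max_delay_s)
        # Step 5: go back to step 1 for the next period.
        next_release += period_s
    return deadline_met
```

On flight hardware this loop would be a periodic thread under a real-time OS; Python itself gives no timing guarantees, so the sketch only shows the structure of the loop and where the delay check belongs.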




Question: How to verify timing of software?

1. Challenges in verifying timing of software

2. Our track record




Challenges  in  verifying  timing  of  software 




Challenge 1: One processor, many threads

[Figure: timeline of Thread 1 and Thread 2, each marked with its arrival time and its deadline]

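Good idea: make the priority of a thread a function of its deadline; a thread with a shorter relative deadline gets a higher priority.

This is Deadline-Monotonic (DM) priority assignment.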
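A minimal sketch of Deadline-Monotonic priority assignment, assuming each thread is described only by its relative deadline. The function name and the "1 is the highest priority" convention are illustrative choices, not taken from the slides.

```python
from typing import Dict


def deadline_monotonic_priorities(relative_deadlines: Dict[str, float]) -> Dict[str, int]:
    """Assign Deadline-Monotonic priorities.

    Returns a map from thread name to priority level, where 1 is the highest
    priority. A shorter relative deadline gives a higher priority; ties are
    broken by name so that the result is deterministic.
    """
    ordered = sorted(relative_deadlines, key=lambda name: (relative_deadlines[name], name))
    return {name: level for level, name in enumerate(ordered, start=1)}
```

For example, with relative deadlines of 5, 10 and 20 ms for Thread 1, 2 and 3, Thread 1 gets priority 1. On a single processor, for independent periodic threads whose deadlines do not exceed their periods, DM is known to be an optimal fixed-priority assignment.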
Challenge 2: Priority inversion, critical sections


Thread  1  and  Thread  3  use 
critical  section  S 

Assign  priorities  so  a 
thread  with  short  deadline 
has  high  priority  (DM) 
This situation almost caused a mission failure of an
autonomous system (see NASA Mars Pathfinder 1997).
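The priority inheritance protocol, one of the classical single-processor protocols, bounds this kind of inversion: while Thread 3 holds S and Thread 1 waits for it, Thread 3 runs at Thread 1's priority, so a medium-priority thread such as Thread 2 cannot preempt it. The sketch below models only that priority bookkeeping for a single, non-nested lock; the class names are hypothetical and it is neither a scheduler nor a real RTOS API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare threads by identity, not by field values
class Thread:
    name: str
    base_priority: int  # larger number = higher priority in this sketch
    inherited: List[int] = field(default_factory=list)

    @property
    def effective_priority(self) -> int:
        # A thread runs at the highest of its own and any inherited priority.
        return max([self.base_priority, *self.inherited])


class PriorityInheritanceMutex:
    """Single, non-nested mutex with basic priority inheritance."""

    def __init__(self) -> None:
        self.owner: Optional[Thread] = None
        self.waiters: List[Thread] = []

    def lock(self, thread: Thread) -> bool:
        """Return True if acquired; otherwise queue the thread and boost the owner."""
        if self.owner is None:
            self.owner = thread
            return True
        self.waiters.append(thread)
        self._propagate()
        return False

    def unlock(self) -> Optional[Thread]:
        """Release the lock and hand it to the highest-priority waiter, if any."""
        if self.owner is None:
            raise RuntimeError("unlock of a mutex that is not held")
        self.owner.inherited.clear()  # drop the boost (valid only without nesting)
        if not self.waiters:
            self.owner = None
            return None
        successor = max(self.waiters, key=lambda t: t.effective_priority)
        self.waiters.remove(successor)
        self.owner = successor
        self._propagate()
        return successor

    def _propagate(self) -> None:
        # The owner inherits the priorities of all threads blocked on this mutex.
        assert self.owner is not None
        self.owner.inherited = [t.effective_priority for t in self.waiters]
```

With Thread 1, 2 and 3 at priorities 3, 2 and 1, Thread 3 locking S and Thread 1 then blocking on it raises Thread 3's effective priority to 3, above Thread 2; on unlock, Thread 3 drops back to 1 and Thread 1 becomes the owner.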
Challenge 3: Memory interference in multicore processors


Let  us  consider  a  system  with  a  single  processor  first. 


Processors  share  memory  bus 
Processors  share  last-level  cache 
… that Thread 2 brought into the …
… can cause a deadline miss.
Challenge 4: Execution overruns
If a thread executes for longer than its assumed worst-case execution time (WCET),
then a deadline may be missed.
Challenge 5: Mode change

We need to prove that Thread 1 does not miss a deadline
during the transition from Mode 1 to Mode 2.
Challenges 1, 2, 3:

Previous work on single processor:

• Priority ceiling protocol and priority inheritance protocol

Our work on multiprocessor:

• First analysis of priority inheritance protocol for (global) multiprocessor (RTSS’09)

• First method for analyzing contention on memory bus (RTSS-WIP’09)

• First coordinated cache and bank coloring (ICESS’13)

• First method for analyzing contention on memory bus considering bank sharing (RTAS’14)
Challenges 4, 5:

• First implementation of a mixed-criticality scheduler in an OS kernel (VxWorks), under evaluation by NASA

• First locking protocol for mixed-criticality scheduling (RTAS’11)

• First mode change protocol and analysis for EDF (OPODIS’08)

• First mode change protocol with mode-independent tasks on a multiprocessor (ECRTS’11)
Timing challenges specific to autonomous systems

Challenge 6: The execution time of a thread is
highly variable.

Challenge  7:  A  thread  may  not  even  terminate. 

Challenge  8:  The  environment  is  unknown  and 
hence  the  number  of  events  that  the  software 
needs  to  process  is  not  known  before  run-time. 

Challenge  9:  The  execution  of  the  software 
depends  on  the  physical  world  and  the  physical 
world  depends  on  the  software. 
Sharing of Multiple Hardware Resources


Need for Coordinated Protection


Need  to  constrain  interference  through  each  resource  type 

•  CPU  cycles 

•  Cache 

•  Memory  Banks 

•  Memory  Bus  /  inter-core  network 

Ensure  no  inconsistent  configuration 

•  Configuration  for  one  resource  does  not  invalidate  configuration  of 
another 
Cache Partitioning (Coloring)

[Figure: cache partitions (colors) mapped to main memory]

Bank Partitioning (Coloring)

[Figure: DRAM bank partitions (colors) mapped to main memory]

Cache and Bank Address Bits

[Figure: physical address bits 16 to 12, showing the bank bits and the cache index bits]

E.g. 2 bank bits, 2 cache bits, 1 shared bit
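The effect of a shared bit can be checked by enumeration. The sketch below assumes a hypothetical address layout in the spirit of the example: bits 12-13 appear only in the cache index, bits 15-16 only in the bank index, and bit 14 in both. Real bit mappings are platform-specific and have to be profiled. Because the shared bit fixes part of both colors, each cache color can be paired with only half of the bank colors.

```python
from typing import Sequence, Set, Tuple

# Hypothetical physical-address layout (real mappings are platform-specific):
# cache color from bits 12-14, bank color from bits 14-16, bit 14 shared.
PAGE_SHIFT = 12
CACHE_COLOR_BITS = (12, 13, 14)
BANK_COLOR_BITS = (14, 15, 16)


def extract_bits(addr: int, bits: Sequence[int]) -> int:
    """Gather the given address bits into a small integer (bits[0] is the LSB)."""
    value = 0
    for i, bit in enumerate(bits):
        value |= ((addr >> bit) & 1) << i
    return value


def page_colors(addr: int) -> Tuple[int, int]:
    """Return (cache color, bank color) of the page containing addr."""
    return extract_bits(addr, CACHE_COLOR_BITS), extract_bits(addr, BANK_COLOR_BITS)


def reachable_color_pairs() -> Set[Tuple[int, int]]:
    """Enumerate every page over bits 12-16 and collect its (cache, bank) colors."""
    used_bits = set(CACHE_COLOR_BITS) | set(BANK_COLOR_BITS)
    span = max(used_bits) - PAGE_SHIFT + 1
    return {page_colors(page << PAGE_SHIFT) for page in range(1 << span)}
```

On this layout only 32 of the 8 × 8 = 64 possible (cache color, bank color) pairs are reachable, which is what limits how freely the two kinds of partitions can be combined.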
Row-Bank Address Bit XORing Improves
Coverage

If two additional (row) address bits are XORed with the bank bits, we can
get all combinations of bank and cache colors.

[Figure: grid of bank colors vs. cache colors]
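The coverage claim can be checked the same way. This sketch assumes the same hypothetical layout as the address-bit example (cache color from bits 12-14, bank color from bits 14-16) and, as an assumption about the memory controller, XORs bank bits 14 and 15 with row bits 17 and 18. The bit positions are illustrative only.

```python
from typing import Set, Tuple

# Hypothetical mapping; real bit positions are platform-specific.
CACHE_COLOR_BITS = (12, 13, 14)
BANK_COLOR_BITS = (14, 15, 16)
XOR_WITH = {14: 17, 15: 18}  # bank bit -> row bit it is XORed with (assumed)


def bit(addr: int, n: int) -> int:
    return (addr >> n) & 1


def colors(addr: int, xor_row_bits: bool) -> Tuple[int, int]:
    """Return (cache color, bank color), optionally applying the row-bit XOR."""
    cache = sum(bit(addr, b) << i for i, b in enumerate(CACHE_COLOR_BITS))
    bank = 0
    for i, b in enumerate(BANK_COLOR_BITS):
        v = bit(addr, b)
        if xor_row_bits and b in XOR_WITH:
            v ^= bit(addr, XOR_WITH[b])
        bank |= v << i
    return cache, bank


def coverage(xor_row_bits: bool) -> Set[Tuple[int, int]]:
    """Collect (cache, bank) color pairs over every page frame in bits 12-18."""
    return {colors(page << 12, xor_row_bits) for page in range(1 << 7)}
```

With the XOR, all 64 (cache color, bank color) pairs become reachable; without it only 32 are, because the shared bit ties the two colors together.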
Coordinated Cache and Bank Partitioning &
Core Allocation

Avoid  conflicting  color  assignments 

Take  advantage  of  different  conflict  behaviors 

•  Banks  can  be  shared  within  same  core  but  not  across  cores 

•  Cache  cannot  be  shared  within  or  across  cores 

•  Coordinated  core  and  bank  color  allocation 

Take  advantage  of  sensitivity  of  execution  time  to  cache 

•  Task  with  highest  sensitivity  to  cache  is  assigned  more  cache 

•  Diminishing  returns  taken  into  account 

Two  algorithms  explored 


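The two algorithms themselves are not described here. As a generic illustration of "the task most sensitive to cache gets more cache, with diminishing returns taken into account", the sketch below uses greedy marginal allocation: each spare cache color goes to the task whose execution time drops the most. It is not either of the authors' algorithms; the function name and the execution-time curves are assumptions.

```python
from typing import Callable, Dict


def allocate_cache_colors(
    total_colors: int,
    exec_time: Dict[str, Callable[[int], float]],
    min_colors: int = 1,
) -> Dict[str, int]:
    """Greedy marginal allocation of cache colors.

    exec_time[task](n) is the task's (measured or modeled) execution time when
    it owns n cache colors. Every task first gets min_colors; each spare color
    then goes to the task with the largest reduction in execution time.
    """
    alloc = {task: min_colors for task in exec_time}
    spare = total_colors - min_colors * len(exec_time)
    if spare < 0:
        raise ValueError("not enough cache colors for the minimum allocation")

    def gain(task: str) -> float:
        # Reduction in execution time from giving this task one more color.
        n = alloc[task]
        return exec_time[task](n) - exec_time[task](n + 1)

    for _ in range(spare):
        best = max(exec_time, key=gain)
        alloc[best] += 1
    return alloc
```

When every curve shows diminishing returns (convex and decreasing), this greedy choice minimizes the total execution time; with a strongly cache-sensitive task A (10 / n) and an insensitive task B (5 + 1 / n) sharing 4 colors, A ends up with 3 colors and B with 1.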
Implementation of Cache+Bank Coloring


Linux  /  RK  :  Kernel  Memory  Manager 

Memory  reserves  with  set  of  bank  and  cache  colors 

Pages  are  classified  in  cache  and  bank  colors 

Added  to  resource  sets  that  are  attached  to  multiple  processes/threads 
Experimental Results
Limited Number of Private Partitions


Private partitions significantly reduce usable memory

• Number of bank/cache cells in memory

• Number of cells = B × H. Size of cell C = M / (B × H)

• With: M = size of memory, B = # bank colors, H = # cache colors

• E.g. Intel Core i7 2600

• M = 4 GB, B = 16, H = 32: C = 4 GB / (16 × 32) = 8 MB

• Private partitions = one cell per cache color & one cell per bank color

• Number of private partitions PP = min(B, H)

• E.g. Intel Core i7 2600: PP = min(16, 32) = 16

• Extreme (using all private partitions): total usable private partition memory = PP × C = 16 × 8 MB = 128 MB
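The arithmetic above can be reproduced in a few lines. The function name and the use of binary units (GiB, MiB) are illustrative assumptions.

```python
GiB = 1 << 30
MiB = 1 << 20


def private_partition_stats(memory_bytes: int, bank_colors: int, cache_colors: int) -> dict:
    """Cell and private-partition arithmetic for bank x cache coloring."""
    cells = bank_colors * cache_colors                   # B * H cells
    cell_size = memory_bytes // cells                    # C = M / (B * H)
    private_partitions = min(bank_colors, cache_colors)  # PP = min(B, H)
    usable = private_partitions * cell_size              # all private partitions in use
    return {
        "cells": cells,
        "cell_size": cell_size,
        "private_partitions": private_partitions,
        "usable_bytes": usable,
        "usable_fraction": usable / memory_bytes,
    }
```

For M = 4 GiB, B = 16 and H = 32 this gives 512 cells of 8 MiB, 16 private partitions, and 128 MiB (1/32 of memory) usable when only private partitions are used.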
Allowing Sharing


In partitioned scheduling, it is OK to share banks within a core

• The number of banks is no longer a restriction: PP = H

• Partitions sharing banks in a core

• # Sets of independent partitions I = N; N = number of cores

• Memory utilization (uniform partitions) = M / I

• Intel Core i7 2600: I = N = 4

• Memory utilization (uniform partitions) = 4 GB / 4 = 1 GB = 25%

Need  better  utilization 

Partitions  may  not  be  enough  for  number  of  tasks 
Predictable Sharing


Exploit  different  sensitivity 
Bounding  interference 
Policing  and  enforcement 
Bank Partitioning (Coloring) + Timing Analysis


Explicitly considers the timing characteristics of major
DRAM resources

• Rank/bank/bus timing constraints (JEDEC standard)

• Request re-ordering effect


Bounding memory interference delay for a task

• Combines request-driven and job-driven approaches

[Figure: a task's own memory requests and the interfering memory requests during the job execution]


Software DRAM bank partitioning awareness

• Analyzes the effect of dedicated and shared DRAM banks
Response-Time Test


• Memory interference delay cannot exceed the result of either the RD or the JD approach

- We take the smaller result from the two approaches


Extended response-time test = classical iterative response-time test + memory interference delay, where the delay is the smaller of the bounds from the Request-Driven (RD) approach and the Job-Driven (JD) approach
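A minimal sketch of the classical iterative test, R(k+1) = C_i + Σ_j ⌈R(k) / T_j⌉ C_j over higher-priority tasks j on the same core, extended with a memory-interference term. The memory term is left as a caller-supplied, non-decreasing function of the window length R (for instance the smaller of an RD bound and a JD bound); how those bounds are computed is not reproduced here. The function name and the integer time units are assumptions.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def response_time(
    wcet: int,
    higher_priority: Sequence[Tuple[int, int]],
    deadline: int,
    memory_delay: Callable[[int], int] = lambda r: 0,
) -> Optional[int]:
    """Iterative response-time test for one task under fixed priorities.

    higher_priority holds (C_j, T_j) for higher-priority tasks on the same
    core, in integer time units. memory_delay(R) is an assumed non-decreasing
    bound on memory interference within a window of length R. Returns the
    response time R, or None if R would exceed the deadline.
    """
    r = wcet
    while True:
        nxt = (
            wcet
            + sum(math.ceil(r / t_j) * c_j for c_j, t_j in higher_priority)
            + memory_delay(r)
        )
        if nxt > deadline:
            return None  # not schedulable under this analysis
        if nxt == r:
            return r  # fixed point reached
        r = nxt
```

For a task with C = 3 and higher-priority tasks (1, 4) and (2, 6), the classical test converges to R = 10; adding a constant memory delay of 1 gives R = 11. Passing `memory_delay=lambda r: min(rd_bound(r), jd_bound(r))`, with `rd_bound` and `jd_bound` as hypothetical helpers, takes the smaller of the two bounds.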
Memory-Interference Aware Task Allocation


Observations

• Memory interference comes from tasks running on other cores

• Tasks running on the same core do not interfere with each other

• Collocate memory-intensive tasks on the same core

Graph G = (V, E): V_i = τ_i, E_ij = interference(τ_i, τ_j), weight(E_ij) = interference between τ_i and τ_j

Following Best-Fit Decreasing (BFD):

1. Try to deploy the first undeployed subgraph on a bin (core)

2. If it does not fit:

• break the graph with a minimum cut (minimize the weight of the cut edges)

• one piece that fits the largest gap + the rest

3. Add the rest to the undeployed subgraphs
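The steps above can be sketched as follows, under simplifying assumptions: each core has a utilization capacity of 1.0, the minimum cut is found by brute force over all splits of a subgraph (exponential, so suitable only for small task sets), and edge weights are taken as given. Names are hypothetical; this illustrates the packing idea and is not the authors' implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

EPS = 1e-9  # tolerance for floating-point utilization sums


def cut_weight(part: FrozenSet[str], rest: FrozenSet[str],
               edges: Dict[Tuple[str, str], float]) -> float:
    """Total weight of interference edges crossing between part and rest."""
    return sum(w for (a, b), w in edges.items()
               if (a in part and b in rest) or (a in rest and b in part))


def min_cut_packing(util: Dict[str, float], edges: Dict[Tuple[str, str], float],
                    cores: int, capacity: float = 1.0) -> Dict[str, int]:
    """Place tasks on cores, keeping heavily interfering tasks together.

    util maps task -> CPU utilization; edges maps (task, task) -> interference
    weight. Returns task -> core index, or raises ValueError if packing fails.
    """
    load = [0.0] * cores
    placement: Dict[str, int] = {}
    pending: List[FrozenSet[str]] = [frozenset(util)]

    def size(group: FrozenSet[str]) -> float:
        return sum(util[t] for t in group)

    def place(group: FrozenSet[str], core: int) -> None:
        for t in group:
            placement[t] = core
        load[core] += size(group)

    while pending:
        # Decreasing order: handle the largest undeployed subgraph first.
        pending.sort(key=size, reverse=True)
        group = pending.pop(0)
        fitting = [k for k in range(cores) if load[k] + size(group) <= capacity + EPS]
        if fitting:
            # Best fit: the core with the least spare capacity that still fits.
            place(group, min(fitting, key=lambda k: capacity - load[k]))
            continue
        # Otherwise cut the subgraph: one piece that fits the largest gap,
        # chosen to minimize the cut weight; the rest is re-queued.
        core = min(range(cores), key=lambda k: load[k])
        gap = capacity - load[core]
        members = sorted(group)
        best: Optional[Tuple[float, FrozenSet[str]]] = None
        for r in range(1, len(members)):
            for combo in combinations(members, r):
                piece = frozenset(combo)
                if size(piece) <= gap + EPS:
                    w = cut_weight(piece, group - piece, edges)
                    if best is None or w < best[0]:
                        best = (w, piece)
        if best is None:
            raise ValueError("task set cannot be packed onto the cores")
        place(best[1], core)
        pending.append(group - best[1])
    return placement
```

For tasks a (0.6), b (0.3), c (0.5) and d (0.4) on two cores, with heavy interference on a-b and c-d and light interference on b-c, the whole graph does not fit on one core, so it is cut along the light b-c edge: a and b go to one core, c and d to the other.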
Minimum-Cut Memory Interference Packing

[Figure: step-by-step packing of the task interference graph onto Core 1, Core 2, and Core 3]


Memory-Interference Aware Task Allocation
(MIAA)

[Figure: results comparing BFDnB, BFDwB, FFDnB, FFDwB, IA3nB, IA3wB, and MIAA]

IA3: M. Paolieri, E. Quiñones, F. Cazorla, R. Davis, and M. Valero. IA3: An interference aware allocation
algorithm for multicore hard real-time systems. RTAS 2011.
Resource Conflicts for Parallelized Workloads

[Figure: parallelization of a task to meet its deadline]

• Computation time > Deadline

• Must parallelize to meet deadline ✓

• Guarantee to always finish before deadline


Resource  interference  within  a  task 

•  Due  to  parallel  subtasks 

•  Need  to  share  memory  to  communicate 


Predictable  sharing 

•  Compatible  with  efficient  parallelized  task  schedulers 


Parallelized Task Scheduling


Developed a staged execution model


Scheduled under Global Earliest-Deadline First (EDF)

• Most efficient scheduling for staged execution

• If a task is schedulable under an optimal scheduler, our scheduler needs
at most twice the processor speed to schedule it
Challenges for Parallelized Task Resource
Management

Intra-task partitions

• Threads with different sensitivities

• Assign different partitions to different parts of the same task

•  Down  to  different  colors  for  each  page  of  a  task 

Inter-task  shared  partitions 

•  Shared  partitions  between  parts  of  different  tasks 

Intra-task  memory  bus  interference 
Hardware and Software Profiling


Hardware 

•  Mapping  of  memory  bits  for  cache  and  bank  index 

•  Randomization  strategies 

Software 

•  Bound  on  number  of  memory  accesses 

•  Temporal  and  spatial  locality  of  accesses 

•  Techniques 

•  Model  checking  (better  term?) 

•  Variable  placement  and  access 

•  Control-flow-based  temporal  and  spatial  locality 

•  Profiling 

•  Performance  counters 

•  Valgrind 
Contact Information


Björn Andersson